Skip to content

Fix unfinished step in parallel flow #4567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

doontagi
Copy link
Contributor

@doontagi doontagi commented Mar 24, 2024

related issue #3939

Motivation

Modification

  • Changed to wait for all tasks to finish before throwing an exception.
  • Changed a job to handle exceptions after waiting for tasks currently being processed to complete. Prevent the job from terminating �even though there are remaining tasks.

Result

@doontagi doontagi force-pushed the fix-unfinished-step-in-parellel-flow branch from f101e58 to b8a1ba8 Compare March 24, 2024 14:23
}
}
}

if (!exceptions.isEmpty()) {
throw exceptions.get(0);
Copy link
Contributor Author

@doontagi doontagi Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, the first caught exception would be thrown.

jobBuilder.preventRestart().build().execute(execution);

boolean isExecutedNonExecutableStep = countDownLatch.await(1, TimeUnit.SECONDS);
assertEquals(BatchStatus.STOPPED, execution.getStatus());
Copy link
Contributor Author

@doontagi doontagi Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished with BatchStatus.STOPPED without executing nonExecutableStep.
This test would fail with executing nonExecutableStep if it runs on the current main branch.

@doontagi
Copy link
Contributor Author

Gentle ping to @fmbenhassine. Could you review this PR? I've fixed #3939.
Thanks!

@fmbenhassine
Copy link
Contributor

Thank you for your PR!

Changed to wait for all asynchronous tasks to finish before throwing an exception.

Isn't that not desired when one of the tasks is interrupted? Meaning if one wants to interrupt a step, the idea is not to wait for other steps to finish.

The change in this PR fixes the reported issue, I just want to make sure there is no side effect on the behaviour of step interruption.

@fmbenhassine fmbenhassine added the status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter label Apr 23, 2024
@doontagi
Copy link
Contributor Author

doontagi commented Jul 14, 2024

Thank you for the comment! And sorry for the late reply. @fmbenhassine

I've also thought about your concern. And I don't think this change of exception handling location affects the current process of handling interruptions in steps, so has no side effect.
Because even if an interruption occurs in one of the steps being processed at the same time, the other step may recognize the interruption only after its execution is finished.


Interruption only can be recognized in catch or final blocks after doExecute has been done.

Therefore, even if interrupt occurs in a step, other steps process its executions until the end and check interruptions from other steps regardless of my modification. In the virtue of my modification, other steps can recognize the interruptions because job is not STOPPED yet.

The difference is that the current code immediately propagates the exception generated by interrupt and terminates the job (STOPPED) without caring of other steps. Therefore, the remaining flows and steps cannot recognize the interrupt because it recognize the interrupt by STOPPING status of the job

private void checkForInterruption(StepExecution stepExecution) {
JobExecution jobExecution = stepExecution.getJobExecution();
jobExecutionDao.synchronizeStatus(jobExecution);
if (jobExecution.isStopping()) {
logger.info("Parent JobExecution is stopped, so passing message on to StepExecution");
stepExecution.setTerminateOnly();
}
}

and continue to be executed with STOPPED status of their parent job. It is a cause of the bug.

In conclusion, there is no change from the view of steps, because in the current code the step execution is also not stopped even if an exception caused by interrupt occurs in another step. Interrupt is caught after step execution has been done. And I think Job shouldn't be stopped if there are running flows and steps.

So I think my changes will prevent other unknown bugs that can occur as flow and step continue to run with job stopped.

Sorry for the lengthy explanation. if you have any other opinions, please leave a comment for more discussions. Thank you!

@fmbenhassine fmbenhassine added status: feedback-provided Issues for which the feedback requested from the reporter was provided and removed status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter labels Sep 2, 2024
@doontagi
Copy link
Contributor Author

Gentle remind @fmbenhassine. Could you check this PR?
Thanks!

@fmbenhassine
Copy link
Contributor

Thank you for your feedback! I saw your comment back then and added a thumbs-up emoji and the status: feedback-provided tag to acknowledge your update. I should have mentioned the release in which i was planning to include this to avoid you waiting for an update from my side. Anyway, this looks good to me and should go into 5.2.1 planned for tomorrow.

Thank you for going into details with your last update! I love how you explained the step interruption process and how this change should not have a side effect 👍

@fmbenhassine
Copy link
Contributor

Rebased and merged as 7678949. Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core pr-for: bug status: feedback-provided Issues for which the feedback requested from the reporter was provided
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Different behavior when interrupting a job, depending on the parallel flow order
2 participants